HDDS-2642. Expose decommission / maintenance metrics via JMX #3781
Conversation
…X and prom endpoints. Metrics object NodeDecommissionMetrics and associated unit tests included. Preliminary integration into decommissioning workflow monitoring - see diff for details.
…t. Refactored monitor unit test utils for common use between the unit tests for the monitor and the unit test for decommissioning progress metrics.
…ecommissioning_maintenance_nodes in DataAdminMonitorImpl as it was not captured properly by prom endpoint.
…tric for getting unhealthy container metric for nodes in decommissioning and maintenance workflow.
The changes look good, but I think it would be much more useful if we could track metrics at the decommissioning node level too. I.e., I had a look at the ReplicationManagerMetric class, and in there is an example of how to form a metric "on the fly". I think it should be possible to store the counts per hostname in a map or list, and then, when the metrics are snapshotted, form dynamic metric names for the host-level under / over / unhealthy container counts. Also keep the aggregate metrics. These host-level metrics would let people see if one host is stuck or if all are making progress, etc.
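A minimal sketch of that suggestion, with hypothetical class and metric names (the PR's final code differs): keep plain per-host counts in a map and form dynamic metric names for them when the metrics system snapshots the source, alongside the aggregate gauge.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsRecordBuilder;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.lib.Interns;

/** Illustrative only: per-host counts plus an aggregate, snapshotted on demand. */
class PerHostDecommissionMetricsSketch implements MetricsSource {
  private final Map<String, Long> underReplicatedByHost = new ConcurrentHashMap<>();
  private volatile long underReplicatedTotal;

  void setUnderReplicated(String host, long count) {
    underReplicatedByHost.put(host, count);
  }

  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    MetricsRecordBuilder rb = collector.addRecord("NodeDecommissionMetrics");
    // Aggregate metric, always present.
    rb.addGauge(Interns.info("ContainersUnderReplicatedTotal",
        "Total under-replicated containers on tracked nodes"), underReplicatedTotal);
    // One dynamically named gauge per tracked host.
    underReplicatedByHost.forEach((host, count) ->
        rb.addGauge(Interns.info("ContainersUnderReplicated-" + host,
            "Under-replicated containers for " + host), (long) count));
  }
}
```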
@Metric("Number of nodes tracked for decommissioning and maintenance.")
private MutableGaugeLong totalTrackedDecommissioningMaintenanceNodes;
The metric names should be renamed to follow standard conventions (we don't do it everywhere in our code, but we should for new metrics). This way, when searching for metrics, we can first filter by entity and then by metric. Applicable to all metrics here.
- private MutableGaugeLong totalTrackedDecommissioningMaintenanceNodes;
+ private MutableGaugeLong trackedDecommissioningMaintenanceNodesTotal;
Thanks @kerneltime for your review of this PR and for your comments. I've pushed changes in the latest commit naming the metrics in that manner for all metrics collected, i.e. trackedDecommissioningMaintenanceNodesTotal.
HDFS has a metric like this: it seems to register an MBean instance in the FSNameSystem class, and then in a few places it provides these JSON key values in the metrics.
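For reference, a hedged sketch of that HDFS-style pattern; the bean and attribute names below are illustrative, not an existing Ozone API. The MBean attribute returns the per-node detail as one JSON string.

```java
import javax.management.ObjectName;
import org.apache.hadoop.metrics2.util.MBeans;

// Hypothetical MXBean interface; JMX exposes its getter as a string attribute.
interface DecommissionInfoMXBean {
  String getDecommissioningNodes();
}

class DecommissionInfo implements DecommissionInfoMXBean {
  private ObjectName mbeanName;

  void start() {
    // Registers under Hadoop:service=StorageContainerManager,name=DecommissionInfo
    mbeanName = MBeans.register("StorageContainerManager", "DecommissionInfo", this);
  }

  @Override
  public String getDecommissioningNodes() {
    return "{}"; // Build a JSON map of host -> container/pipeline counts here.
  }

  void stop() {
    if (mbeanName != null) {
      MBeans.unregister(mbeanName);
    }
  }
}
```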
…maintenance workflow by HOST. Added tests for host based metrics monitoring. Also added name changes to metrics as per reviewer comments.
Thanks @sodonnel for your help to expose the decommission / maintenance metrics for monitoring the workflow. As you suggested, I've added metrics to monitor the workflow progress by host. These host-based metrics are created dynamically and track the pipeline and container state for datanodes going through the decommissioning and maintenance workflow. The metrics include:
… to ensure metrics are refreshed in a timely manner.
private MutableGaugeLong trackedDecommissioningMaintenanceNodesTotal;

@Metric("Number of nodes tracked for recommissioning.")
private MutableGaugeLong trackedRecommissionNodesTotal;
Is tracked a bit superfluous here? These can be RecommissionNodesTotal unless the prefix tracked is adding more context here.
The significance of tracked is coming from the convention in the monitor code which tracks the nodes in the decommissioning and maintenance workflow -
DatanodeAdminMonitorImpl.java javadoc: Once a node is placed into tracked nodes, it goes through a workflow where
the following happens:
and the corresponding log outputs,
INFO node.DatanodeAdminMonitorImpl: There are 1 nodes tracked for decommission and maintenance.
Would it be better to continue to use the prefix "tracked", remove it altogether, or perhaps use another keyword uniform across the decommissioning/maintenance mode metrics so they are easy to search for with JMX and Prometheus?
I think we can drop "tracked" and improve readability when visualizing.
Thanks @kerneltime. Changes pushed see: #3781 (comment)
private long underReplicatedContainers = 0;

@SuppressFBWarnings(value = "SIC_INNER_SHOULD_BE_STATIC")
private final class ContainerStateInWorkflow {
Rather than suppress the FB warning, can this class be made private final static class ...? I am not an expert in this area, but usually the inner classes I've seen are static. The difference between static and non-static inner classes seems to be that non-static inner classes can directly access the enclosing class's instance variables and methods.
A static inner class cannot directly access the enclosing class's methods; it has to do so via an object reference.
In this case, the inner class is a simple wrapper around a set of variables and does not need to access the enclosing class's methods, and therefore can be static, I think.
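A tiny illustration of that distinction (generic names, not code from this PR):

```java
class Outer {
  private int outerField = 1;

  // Non-static inner class: carries an implicit reference to the Outer
  // instance, so it can read outerField directly.
  private final class Inner {
    int read() {
      return outerField;
    }
  }

  // Static nested class: no implicit Outer reference; it reaches outer
  // state only through an explicit object reference.
  private static final class Nested {
    int read(Outer o) {
      return o.outerField;
    }
  }
}
```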
Thanks. Removed the suppress annotation and instead properly converted the final inner class to a static final nested class. As the inner class does not refer to the outer class instance, it should indeed be a static nested class.
}

public void setAll(long sufficiently,
    long under,
Formatting here seems off again - should either be 4 spaces in from the line above or aligned with the other parameters.
Fixed.
…inner class to private final static nested class. Code formatting indentation corrections not picked up by checkstyle.
@VisibleForTesting
public Long getTrackedPipelinesWaitingToCloseByHost(String host) {
  if (!trackedPipelinesWaitingToCloseByHost.containsKey(host)) {
These get methods could all be simplified to something like:
private static final MutableGaugeLong ZERO_GAUGE = new MutableGaugeLong(...);
return trackedPipelinesWaitingToCloseByHost.getOrDefault(host, ZERO_GAUGE).value();
The getters for metrics by host were cleaned up in the latest commit, e.g. getTrackedPipelinesWaitingToCloseByHost; please check.
    long num) {
  trackedPipelinesWaitingToCloseByHost.computeIfAbsent(host,
      hostID -> registry.newGauge(
          Interns.info("trackedPipelinesWaitingToClose-" + hostID,
This block is repeated 4 times with just the name and value changing:
Interns.info("trackedSufficientlyReplicated-" + hostID,
"Number of sufficiently replicated containers "
+ "for host in decommissioning and "
+ "maintenance mode"), 0L)).set(sufficientlyReplicated);
Could you move it into a private method to reduce the duplicated code?
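One way the duplication could be factored out, sketched with hypothetical names (the PR ultimately added a helper called createContainerMetricsInfo instead):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.metrics2.lib.Interns;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

class HostGaugeHelperSketch {
  private final MetricsRegistry registry = new MetricsRegistry("NodeDecommissionMetrics");
  private final Map<String, MutableGaugeLong> unhealthyByHost = new ConcurrentHashMap<>();

  // One helper replaces the four near-identical computeIfAbsent blocks; only
  // the name prefix, description and value differ per call site.
  private void setHostGauge(Map<String, MutableGaugeLong> gaugesByHost,
      String namePrefix, String description, String hostID, long value) {
    gaugesByHost.computeIfAbsent(hostID,
        h -> registry.newGauge(Interns.info(namePrefix + "-" + h, description), 0L))
        .set(value);
  }

  void setUnhealthy(String host, long count) {
    setHostGauge(unhealthyByHost, "trackedUnhealthyContainers",
        "Number of unhealthy containers for host in decommissioning and maintenance mode",
        host, count);
  }
}
```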
Thanks. Moved to a new private method:
private TrackedWorkflowContainerState createContainerMetricsInfo
this.replicationManager = replicationManager;

containerStateByHost = new HashMap<>();
pipelinesWaitingToCloseByHost = new HashMap<>();
Why split the pipelines into a separate map? It looks like it would be easier overall to have a pipeline count setter on the ContainerStateInWorkflow object and just carry the pipeline count around with the containers etc. too?
The split between replication state and pipelines was for grouping - they are initialized and set in separate parts of the monitor code, which resulted in using two separate maps to store the two. As suggested, looking to reuse ContainerStateInWorkflow for both, perhaps with two different setters: one for replication and the other for pipelines.
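A sketch of what that combined per-host value object could look like; the field and method names are illustrative, not the exact PR code.

```java
/** Per-host workflow state carried together instead of split across two maps. */
final class ContainerStateInWorkflow {
  private final String host;
  private long sufficientlyReplicated;
  private long underReplicated;
  private long unhealthy;
  private long pipelinesWaitingToClose;

  ContainerStateInWorkflow(String host) {
    this.host = host;
  }

  // Replication counters and the pipeline counter are set from different
  // parts of the monitor pass, hence separate setters.
  void setReplicationCounts(long sufficiently, long under, long unhealthyCount) {
    sufficientlyReplicated = sufficiently;
    underReplicated = under;
    unhealthy = unhealthyCount;
  }

  void setPipelinesWaitingToClose(long count) {
    pipelinesWaitingToClose = count;
  }
}
```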
Combined all metrics collected by host in monitor to ContainerStateInWorkflow as suggested.
metrics.metricRecordPipelineWaitingToCloseByHost(e.getKey(),
    e.getValue());
}
for (Map.Entry<String, ContainerStateInWorkflow> e :
I might be wrong, but I think there is a bug here.
Let's say we put a host into maintenance. It will have some metrics tracked in the ByHost maps.
After each pass we reset these maps to have zero counts, but we don't remove the entries from the maps anywhere (unless I have missed it). Then we update the values accordingly.
Later the node goes back into service and even though it is removed from the monitor, it will be tracked with zero counts forever.
Over time on a long running cluster, we will build up a lot of "by host" metrics with zero values, when they really should be removed.
I think the reset will need to remove them from the maps rather than zeroing them, and also when setting the values to the metric gauge, you will need to remove values no longer there from it too.
It might be easier to pass a Map<String, ContainerStateInWorkflow> to the metrics class to facilitate removing the stale entries.
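A sketch of that "replace rather than zero" idea, with hypothetical names: each monitor pass hands the metrics object the complete current per-host map, so hosts that left the workflow simply disappear instead of lingering with zero counts.

```java
import java.util.HashMap;
import java.util.Map;

class WorkflowMetricsSnapshotSketch {
  private final Map<String, ContainerStateInWorkflow> containerStatesByHost = new HashMap<>();

  // Replace the whole snapshot; stale hosts are dropped automatically.
  synchronized void setContainerStatesByHost(Map<String, ContainerStateInWorkflow> current) {
    containerStatesByHost.clear();
    containerStatesByHost.putAll(current);
  }

  // Placeholder for the per-host value object discussed in this thread.
  static final class ContainerStateInWorkflow { }
}
```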
@sodonnel, with the metrics registry it appears that the metrics we track remain in the registry. With this behavior, each datanode we add for tracking currently remains unless we have an API to remove it from the MetricsRegistry. Is there a way to delete/remove a gauge from the registry? See MetricsRegistry.java https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/MetricsRegistry.java#L40.
Hmm, looks like you are correct. I wonder what the best approach is here.
I don't think it's a great user experience if we start with no individual nodes tracked, and then over time (in a long-running SCM) more and more nodes get added for maintenance and decommission and the number builds up, all with zero counts. I guess it's not a major problem, but it would be nice to resolve it somehow.
In #3791 Symious added a tag with a group of metrics in JSON form. For the metrics system, this is just a tag mapping to a string, rather than a gauge, but we could group all currently decommissioning / maintenance nodes into a JSON representation to expose the fine-grained info. If no nodes are in the workflow, it would just be an empty JSON object, so nodes can come and go easily.
Then you still have your aggregate metrics as they are now.
It is unlikely that someone would want to chart an individual DN as they would have to create a new chart for each DN.
What do you think?
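A rough sketch of that tag idea, with hypothetical names: the per-node detail is published as a single JSON-valued tag next to the aggregate gauges, so nodes can come and go freely.

```java
import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.lib.Interns;

class DecommissionJsonTagSketch implements MetricsSource {
  private volatile long trackedNodesTotal;
  private volatile String perNodeDetailJson = "{}"; // e.g. {"dn1":{"underReplicated":1}}

  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    collector.addRecord("NodeDecommissionMetrics")
        .tag(Interns.info("DecommissioningNodesDetail",
            "Per-node container and pipeline counts as JSON"), perNodeDetailJson)
        .addGauge(Interns.info("DecommissioningMaintenanceNodesTotal",
            "Nodes tracked for decommissioning and maintenance"), trackedNodesTotal);
  }
}
```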
I've modified the code to dynamically add to the collector (without using the helper MetricsRegistry class to add gauges), as is done similarly in the namenode top-metrics collection. See https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/top/metrics/TopMetrics.java#L167.
Here the metrics are collected dynamically by host while the host node is in the workflow. When the node exits the workflow, the metrics for that host are no longer collected and no longer appear in the JMX output. Note this is true for JMX; the prom endpoint seems to retain the last value pushed. See the following metrics pushed out to JMX for the NodeDecommissionMetrics when datanode-2 is decommissioned:
before
"name" : "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics",
"modelerType" : "NodeDecommissionMetrics",
"tag.Hostname" : "0d207b6cbbf1",
"TrackedDecommissioningMaintenanceNodesTotal" : 0,
"TrackedRecommissionNodesTotal" : 0,
"TrackedPipelinesWaitingToCloseTotal" : 0,
"TrackedContainersUnderReplicatedTotal" : 0,
"TrackedContainersUnhealthyTotal" : 0,
"TrackedContainersSufficientlyReplicatedTotal" : 0
}, {
during
"name" : "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics",
"modelerType" : "NodeDecommissionMetrics",
"tag.Hostname" : "0d207b6cbbf1",
"TrackedDecommissioningMaintenanceNodesTotal" : 1,
"TrackedRecommissionNodesTotal" : 0,
"TrackedPipelinesWaitingToCloseTotal" : 2,
"TrackedContainersUnderReplicatedTotal" : 0,
"TrackedContainersUnhealthyTotal" : 0,
"TrackedContainersSufficientlyReplicatedTotal" : 0,
"TrackedUnhealthyContainers-ozone-datanode-2.ozone_default" : 0,
"TrackedSufficientlyReplicated-ozone-datanode-2.ozone_default" : 0,
"TrackedPipelinesWaitingToClose-ozone-datanode-2.ozone_default" : 2,
"TrackedUnderReplicated-ozone-datanode-2.ozone_default" : 0
}, {
}, {
"name" : "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics",
"modelerType" : "NodeDecommissionMetrics",
"tag.Hostname" : "0d207b6cbbf1",
"TrackedDecommissioningMaintenanceNodesTotal" : 1,
"TrackedRecommissionNodesTotal" : 0,
"TrackedPipelinesWaitingToCloseTotal" : 0,
"TrackedContainersUnderReplicatedTotal" : 1,
"TrackedContainersUnhealthyTotal" : 0,
"TrackedContainersSufficientlyReplicatedTotal" : 0,
"TrackedUnhealthyContainers-ozone-datanode-2.ozone_default" : 0,
"TrackedSufficientlyReplicated-ozone-datanode-2.ozone_default" : 0,
"TrackedPipelinesWaitingToClose-ozone-datanode-2.ozone_default" : 0,
"TrackedUnderReplicated-ozone-datanode-2.ozone_default" : 1
}, {
after
}, {
"name" : "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics",
"modelerType" : "NodeDecommissionMetrics",
"tag.Hostname" : "0d207b6cbbf1",
"TrackedDecommissioningMaintenanceNodesTotal" : 0,
"TrackedRecommissionNodesTotal" : 0,
"TrackedPipelinesWaitingToCloseTotal" : 0,
"TrackedContainersUnderReplicatedTotal" : 0,
"TrackedContainersUnhealthyTotal" : 0,
"TrackedContainersSufficientlyReplicatedTotal" : 0
}, {
The host datanode-2 metrics are no longer visible once the node exits the workflow.
This seems to follow how Hadoop handles metrics collected dynamically; however, the prom endpoint seems to retain the last pushed value for some reason. Is this what we should expect when collecting metrics for hosts as they go in and out of the workflow?
I'm not sure how the prom endpoint works. It's not ideal that it keeps the last value pushed, but I am not sure where that code even comes from!
Thanks. To complete this PR and expose the decommission / maintenance metrics via JMX, we should go forward with this implementation, which works for JMX metrics, and open a new Jira to look into supporting the prom endpoint. This PR supports metrics tracking the decommission and maintenance workflow with both aggregated counts and DN host-specific counts. A Jira will be filed to track the prom endpoint behavior for the metrics. What do you think?
Thanks for updating the formatting. I have a few more comments around code reuse and also a possible bug to check on. Also, a conflict has appeared against one class which needs to be resolved.
One other thing I spotted: the host-level metrics start with a lower-case letter, but the others start with upper case. We should be consistent here, in line with what other metrics do.
…bject and for consistent snapshot pulled by metricsSystem from metrics object. Cleanup of metrics reset and refinement to metrics gathered by host to follow.
Thanks. Pushed similar changes for simplification and a fix for the synchronization issue in the latest commit after the revert. Some minor cleanup and changes to follow.
…aintenance mode metrics collected by HOST to be a metricRecord with a tag associated with the gauge. With this change, a single metric name is used for all host (node) based container state metrics. Within the single name, e.g. TrackedSufficientlyReplicatedDN, a tag is displayed that identifies the host node for the metric. Works for both JMX and can be displayed in graphical/table form by Prometheus.
Pushed new changes that finish the clean-up for the metrics reset and collection in the monitor. In addition, the metric in the NodeDecommissionMetrics changed for container state metrics per node in the workflow. Now a single metric name is used for the same metric collected for each datanode; within the metric is an associated tag that identifies the node for the reading, e.g. `node_decommission_metrics_tracked_sufficiently_replicated_dn{datanode="ozone-datanode-2.ozone_default",hostname="39160451dea0"}`. The decommissioning / maintenance workflow is tracked by JMX, displaying each aggregated metric and displaying the node container state metrics only while the node is in the workflow. Prometheus now also displays each aggregated metric, and, under a unique metric name for each container state metric, displays each host associated with the reading as a tag. This can be seen in the attached screenshots of a decommissioning run (during and after) and in Prometheus, decommissioning datanode-2 (green) and datanode-3 (yellow).
Note: filed a Jira for a problem on the master branch with Prometheus scraping SCM metrics from the prom endpoint. This affects our metrics monitoring with prom as well as SCM-related metrics such as the
Hi @sodonnel, currently the release of Ozone-1.3 is blocked on this PR. Could you help continue to review this PR?
I'm out of the office this week. I will look at this again on Monday. I don't think this PR is essential for the 1.3 release, as we have lived without these metrics up until now.
Thanks @sodonnel for the feedback.
Hi @captainzmc, thanks for reaching out about including this PR in the 1.3 release. This patch is actually something that we would like to have in the 1.3 release. Such functionality is new and needed for our production environment, and we would like to use it in the 1.3 stable release. Please do continue to include this PR in the 1.3 release blocked list.
@captainzmc, having visibility on the decommissioning / maintenance process is something quite important from an operational point of view if you want to run a production cluster. Our operations team will be very reluctant to use Ozone in prod without it, as they will not know what is happening or not. It's probably a fundamental feature from an operational point of view. We want to use the 1.3 version, as it will be widely used, rather than a later master branch. This is why we would like to have it included in 1.3.
Thanks @neils-dev @michelsumbul for the feedback. I got your point. Sure, let's keep this PR in the 1.3 release blocked list.
Sorry, GitHub was showing me the wrong commit (or I was being stupid!). I see it now.
for (Map.Entry<String, ContainerStateInWorkflow> e :
    containerStatesByHost.entrySet()) {
  trackedWorkflowContainerMetricByHost
      .computeIfAbsent(MetricByHost.SufficientlyReplicated
I don't think we need all the computeIfAbsent calls here - we just cleared the map so they are always going to be absent, so just put the new value.
It's a bit strange that we have 4 tags for the same DN. The way I'd expect this to work is that we have 1 tag per DN, and then the 4 metrics (under, sufficiently, pipelines, unhealthy) all share that tag. I think this is due to the way you are adding the tags. It might be easier if you simply stored a Set or Map instead. Example of what I mean, plus more simplifications: sodonnel@673109c
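A sketch of that single-tag-per-DN layout, with hypothetical names (compare sodonnel@673109c for the reviewer's version): one record is emitted per tracked datanode, the datanode tag identifies the host, and the four workflow gauges share that tag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsRecordBuilder;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.lib.Interns;

class TaggedPerHostMetricsSketch implements MetricsSource {
  private final Map<String, ContainerStateInWorkflow> statesByHost = new ConcurrentHashMap<>();

  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    for (Map.Entry<String, ContainerStateInWorkflow> e : statesByHost.entrySet()) {
      ContainerStateInWorkflow s = e.getValue();
      // One record per datanode; the tag carries the host identity.
      MetricsRecordBuilder rb = collector.addRecord("NodeDecommissionMetrics")
          .tag(Interns.info("datanode", "Host in decommission/maintenance workflow"),
              e.getKey());
      rb.addGauge(Interns.info("SufficientlyReplicatedDN", ""), s.sufficientlyReplicated);
      rb.addGauge(Interns.info("UnderReplicatedDN", ""), s.underReplicated);
      rb.addGauge(Interns.info("UnhealthyContainersDN", ""), s.unhealthy);
      rb.addGauge(Interns.info("PipelinesWaitingToCloseDN", ""), s.pipelinesWaitingToClose);
    }
  }

  static final class ContainerStateInWorkflow {
    long sufficientlyReplicated;
    long underReplicated;
    long unhealthy;
    long pipelinesWaitingToClose;
  }
}
```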
…low inner class to encapsulate host metrics used in both metric object for metric publication and in monitor in separate instances for data collection. Cleaned up host metrics tags so that each host workflow metrics are tagged together under its datanode and hostname.
Latest push contains both simplification changes to
Example with datanode 1 and datanode 3 in the decommissioning workflow as captured by the monitor:
@sodonnel, for a possible fix for handling this: the flush of old stale metrics and populating the internal PrometheusMetricsSink map can be handled similarly to a resolved HDFS Jira, HADOOP-17804.
Yes, it makes sense to open a new Jira for the Prometheus issue. The tagged metrics look better now - we have one tag per DN, which is what I would expect.
sodonnel left a comment
Changes look good now. Thanks for pushing through all the suggestions - I think it's much cleaner now than when we started, and the "tag" idea for the metrics is good too - I had not seen that before.
@kerneltime has a suggestion on removing "tracked" from the metric names to make them shorter - I am happy either way, but let's wait for him to comment before we commit.
Thanks @kerneltime, @sodonnel for the comment on removing the prefix "tracked" from metrics published through the NodeDecommissionMetrics. Sounds good. I've updated the code; now metrics are pushed as follows for JMX,
and on the prom endpoint,
…ublished metrics. Affects metric gauge getter and setter methods.
Latest changes look good. Please have a check, @kerneltime, and we can commit if you are happy.
Thanks @neils-dev for the patch, and thanks @sodonnel @kerneltime for the review. Let's merge this.
…3781) * Expose decommission / maintenance metrics via JMX
Thanks @sodonnel, @kerneltime, @captainzmc.


What changes were proposed in this pull request?
To expose metrics from nodes entering the decommissioning and maintenance workflow to JMX and the prom endpoint. These metrics expose the number of datanodes in the workflow, the container replication state of tracked nodes and the number of pipelines waiting to close on tracked nodes. With the following metrics exposed from the NodeDecommissionManager through the DataAdminMonitorImpl, the progress of the decommission and maintenance workflow can be monitored. The progress of datanodes going through the workflow is monitored through aggregated counts of the number of tracked nodes, their number of pipelines waiting to close and the number of containers in each of the sufficiently replicated, under-replicated and unhealthy states. The metrics collected are as discussed in the associated Jira comments.
As exposed to the prom endpoint:
Aggregated total number of datanodes in the workflow: node_decommission_metrics_total_tracked_decommissioning_maintenance_nodes
Of tracked datanodes in the workflow, the container replication state: the total number of containers in each of the sufficiently replicated, under-replicated and unhealthy states
Of tracked datanodes in the workflow, the aggregated number of pipelines waiting to close: node_decommission_metrics_total_tracked_pipelines_waiting_to_close
And the number of datanodes in the workflow that are taken out and recommissioned: node_decommission_metrics_total_tracked_recommission_nodes
Similarly exposed via JMX:
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2642
How was this tested?
Unit tests, CI workflow, and manual testing with a dev docker-cluster, entering nodes into the decommissioning workflow and monitoring the metrics collected at the prom endpoint.
Unit tests:
hadoop-hdds/server-scm$ mvn -Dtest=TestNodeDecommissionMetrics test
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdds.scm.node.TestNodeDecommissionMetrics
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.072 s - in org.apache.hadoop.hdds.scm.node.TestNodeDecommissionMetrics
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0
[INFO]
Manual testing via dev docker-cluster:
modify the docker-config for scm serviceid and serviceid-address:
hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/compose/ozone$
OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm
set docker-compose for monitoring with prometheus:
export COMPOSE_FILE=docker-compose.yaml:monitoring.yaml
hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/compose/ozone$ docker-compose up -d --scale datanode=3
View metrics through the prom endpoint: http://localhost:9090
Decommission a datanode from the scm bash prompt:
$ ozone admin datanode decommission -id=scmservice --scm=172.26.0.3:9894 3224625960ec